pre-allocate storage for metadata json files, see containers/podman#13967 #1480
Keeping a temporary file of at least the same size as the target file for atomic writes helps reduce the probability of running out of space when deleting entities from the corresponding metadata in a disk-full scenario. This strategy is applied to writing containers.json, layers.json, images.json, and mountpoints.json.

Signed-off-by: Denys Knertser <[email protected]>
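For illustration, a minimal sketch of the idea on Linux via `fallocate(2)`; the function name and spare-file naming are invented for this example and are not the PR's actual code:

```go
package main

import (
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// reserveSpare keeps a pre-allocated file next to target, sized to match
// the just-written contents, so the next atomic write has space reserved.
// All names here are illustrative only.
func reserveSpare(target string, size int64) error {
	spare := filepath.Join(filepath.Dir(target), "."+filepath.Base(target)+".spare")
	f, err := os.OpenFile(spare, os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	// fallocate actually reserves blocks on disk; a plain Truncate would
	// only create a sparse file and reserve nothing.
	return unix.Fallocate(int(f.Fd()), 0, 0, size)
}
```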
My initial impression is that this is way too much complexity to handle a desperate situation that could be much more reliably handled by something like deleting a file in /var/log. There may well be several other places that require extra space to free large objects, and without a specific feature test to ensure that this keeps working, we are quite likely to add more of them without ever noticing.

I guess all of this code is more likely to break something in unexpected / untested situations than to help in the situations it is intended for. E.g. the non-Linux path is almost certainly not going to be sufficiently tested, and the behavior difference is likely to be missed by future readers. Admittedly that is a very vague and completely unquantified feeling.

So this is not quite a “no” … but it is almost one.
random := "" | ||
// need predictable name when pre-allocated | ||
if !opts.PreAllocate { | ||
random = strconv.FormatUint(rand.Uint64(), 36) |
This is not a sufficient replacement for `CreateTemp`. I’m not immediately sure if this is called from contexts where that matters.
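For context, a rough sketch of the property `os.CreateTemp` provides and a predictable name does not (an assumption about what the reviewer is alluding to):

```go
package main

import (
	"os"
	"path/filepath"
)

func tempFileContrast(dir string) error {
	// os.CreateTemp retries with fresh random names and opens with
	// O_CREATE|O_EXCL, so an existing (possibly attacker-created) file
	// is never reused:
	f, err := os.CreateTemp(dir, ".tmp-layers.json-")
	if err != nil {
		return err
	}
	f.Close()

	// With a fixed, predictable name there is nothing to retry; the best
	// one can do is fail outright if the name is already taken:
	g, err := os.OpenFile(filepath.Join(dir, ".tmp-layers.json"),
		os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o600)
	if err != nil {
		return err
	}
	return g.Close()
}
```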
```diff
@@ -17,6 +19,9 @@ type AtomicFileWriterOptions struct {
 	// On successful return from Close() this is set to the mtime of the
 	// newly written file.
 	ModTime time.Time
+	// Whenever an atomic file is successfully written create a temporary
+	// file of the same size in order to pre-allocate storage for the next write
+	PreAllocate bool
```
The `PreAllocate` and `!PreAllocate` code paths differ so much, and very importantly they have different security implications (`PreAllocate` must not be used in directories with hostile writers, like `*/tmp`), that I don’t think it makes much sense for them to use the same object at all.

The comparatively small parts of the code that can be shared between the two can be shared another way (e.g. using a shared helper function).
> `PreAllocate` must not be used in directories with hostile writers

Not just that, `PreAllocate` is also only correct if there is some external-to-`pkg/ioutils` locking that ensures serialization of uses of `AtomicFileWriter`. (That’s fine for the container/image/layer stores, we have locks there. But it’s a new responsibility for callers of this code, and it needs to be very clearly documented.)
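A hypothetical illustration of that caller-side responsibility (the store type and lock are stand-ins, and `AtomicWriteFileWithOpts` is assumed from the options struct shown above):

```go
package main

import (
	"sync"

	"github.com/containers/storage/pkg/ioutils"
)

// containerStore is a stand-in for the real stores, which already hold a
// lock around metadata writes. With PreAllocate the temp-file name is
// fixed, so two concurrent writers would collide without this lock.
type containerStore struct {
	mu   sync.Mutex
	path string
}

func (s *containerStore) save(data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return ioutils.AtomicWriteFileWithOpts(s.path, data, 0o600,
		&ioutils.AtomicFileWriterOptions{PreAllocate: true})
}
```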
I can totally see your point regarding the complexity. However, I do think that it is a useful feature to be able to remove specific containers/images once the disk is full, rather than the discussed alternative of performing a system reset that drops all containers with their state, volumes, etc. It might not matter that much in a desktop environment (and could be fixed by freeing some less important files, as suggested, because the containers storage would often be located on a root partition), but in a production environment it is desirable to be able to limit the number of affected services.

From what I have been able to gather so far, Podman needs additional space on […]. Moreover, an atomic write of the *.json files as it is implemented now requires roughly as much free space as the size of the original file, which can easily go up to 100s of kBs even with a handful of containers. This means that the space requirements for such an operation are much larger than for a BoltDB update, so the likelihood of encountering an out-of-disk situation when the disk is low is much higher on metadata *.json atomic writes than on BoltDB updates. By keeping a temporary file of the same size around, we can guarantee that an atomic write of a smaller file (which is the case when entries are removed) will succeed.

So, while I agree that one would try to avoid the complexity, I think it might be justified if it allows solving the actual problem. Running out of disk space might not be a very frequently encountered scenario, but when it happens it can be very frustrating.
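To make that guarantee concrete, a sketch of how a kept spare could be reused on the next write, assuming the writer overwrites the pre-allocated file in place and renames it over the target (again, hypothetical names, not the PR's code):

```go
package main

import "os"

// writeReusingSpare overwrites the pre-allocated spare in place; since its
// blocks are already reserved, writing new content that is no larger than
// the reservation cannot fail with ENOSPC, and the final rename replaces
// the target atomically on POSIX filesystems.
func writeReusingSpare(target, spare string, data []byte) error {
	f, err := os.OpenFile(spare, os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := f.WriteAt(data, 0); err != nil {
		return err
	}
	if err := f.Truncate(int64(len(data))); err != nil { // trim the excess
		return err
	}
	if err := f.Sync(); err != nil {
		return err
	}
	return os.Rename(spare, target)
}
```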
I agree with @mtrmac. There are so many other places where it could go wrong that I am not sure it is worth adding all this complexity, which is difficult to test, just for one instance.
Or maybe … if we are serious about this, there really needs to be an end-to-end Podman test that ensures the actually desired functionality (removing a container when completely out of disk space) keeps working. (I realize that this is a bit circular — we need this PR, and maybe other changes, before that test can be added. So, that means either doing all that work and then discussing whether it is worth maintaining, or discussing that now, with some hypothetical guess at how much work that is.)
This change seems to fix the scenario described in containers/podman#17198.

I would appreciate advice on writing a test case, though. I would imagine that a `delete_container` test case would be a good starting point; however, I lack knowledge of the test environment and don’t know whether it is feasible to fill the filesystem, or to initialize and mount a new filesystem.
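For what it's worth, one possible shape for such a test (everything here is hypothetical: the tmpfs size, the skip behavior, and the elided store setup; mounting requires root):

```go
package storage_test

import (
	"testing"

	"golang.org/x/sys/unix"
)

// Sketch of an end-to-end test: mount a tiny tmpfs, point a store at it,
// fill it until writes fail with ENOSPC, then check deletion still works.
func TestDeleteContainerOnFullDisk(t *testing.T) {
	dir := t.TempDir()
	// A 1 MiB tmpfs makes hitting ENOSPC cheap and deterministic.
	if err := unix.Mount("tmpfs", dir, "tmpfs", 0, "size=1m"); err != nil {
		t.Skipf("cannot mount tmpfs (requires root): %v", err)
	}
	defer unix.Unmount(dir, 0)

	// ... initialize a store rooted at dir, add containers until a write
	// fails with ENOSPC, then assert that removing a container succeeds ...
}
```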